In this thesis, I address the problem of automatically acquiring lexical semantic knowledge, especially that of case frame patterns, from large corpus data and using the acquired knowledge in structural disambiguation. The approach I adopt has the following characteristics: (1) dividing the problem into three subproblems: case slot generalization, case dependency learning, and word clustering (thesaurus construction), (2) viewing each subproblem as that of statistical estimation and defining probability models for each subproblem, (3) adopting the Minimum Description Length (MDL) principle as the learning strategy, (4) employing efficient learning algorithms, and (5) viewing the disambiguation problem as that of statistical prediction. Major contributions of this thesis include: (1) formalization of the lexical knowledge acquisition problem, (2) development of a number of learning methods for lexical knowledge acquisition, and (3) development of a high-performance disambiguation method.